Reviewer's comment v1
Hello Yonas, this is Gerardo Flores 👋
You can find me on Discord as gerardo.flores.ds
I'm happy to review your project today 🙌
You can find my comments under the heading «Review». I will categorize my comments in green, blue, or red boxes.
Please don't remove my comments. If you have any questions, don't hesitate to respond to my comments in a different section.
General Feedback
You’ve submitted another project—great work! Your commitment to pushing through the challenges of this program is admirable.
After reviewing your submission, I believe it should be approved. You can find my more detailed feedback within your project file in the Reviewer's comment v1 section.
And of course, if you have any questions along the way, remember that you can always reach out to your tutor for any clarification.
Video Game Sales Analysis Project¶
Project Overview¶
In this project, you'll analyze video game sales data to identify patterns that determine a game's success. Working as an analyst for the online store Ice, you'll use this information to help plan future advertising campaigns.
Environment Setup and Required Libraries¶
Reviewer's comment v1
Great job adding additional information about the project goal. It would also be helpful to share the steps you plan to complete.
# Import all required libraries
import pandas as pd
import numpy as np
import math as mt
import plotly.express as px
from scipy import stats as st
Step 1: Loading and Initial Data Exploration¶
First, let's load our dataset and examine its basic properties:
# Load the dataset
data = pd.read_csv('/datasets/games.csv')
# Display basic information about the dataset
display(data)
print()
data.info()
print()
data.describe()
| Name | Platform | Year_of_Release | Genre | NA_sales | EU_sales | JP_sales | Other_sales | Critic_Score | User_Score | Rating | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wii Sports | Wii | 2006.0 | Sports | 41.36 | 28.96 | 3.77 | 8.45 | 76.0 | 8 | E |
| 1 | Super Mario Bros. | NES | 1985.0 | Platform | 29.08 | 3.58 | 6.81 | 0.77 | NaN | NaN | NaN |
| 2 | Mario Kart Wii | Wii | 2008.0 | Racing | 15.68 | 12.76 | 3.79 | 3.29 | 82.0 | 8.3 | E |
| 3 | Wii Sports Resort | Wii | 2009.0 | Sports | 15.61 | 10.93 | 3.28 | 2.95 | 80.0 | 8 | E |
| 4 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | 11.27 | 8.89 | 10.22 | 1.00 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16710 | Samurai Warriors: Sanada Maru | PS3 | 2016.0 | Action | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
| 16711 | LMA Manager 2007 | X360 | 2006.0 | Sports | 0.00 | 0.01 | 0.00 | 0.00 | NaN | NaN | NaN |
| 16712 | Haitaka no Psychedelica | PSV | 2016.0 | Adventure | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
| 16713 | Spirits & Spells | GBA | 2003.0 | Platform | 0.01 | 0.00 | 0.00 | 0.00 | NaN | NaN | NaN |
| 16714 | Winning Post 8 2016 | PSV | 2016.0 | Simulation | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
16715 rows × 11 columns
<class 'pandas.core.frame.DataFrame'> RangeIndex: 16715 entries, 0 to 16714 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 16713 non-null object 1 Platform 16715 non-null object 2 Year_of_Release 16446 non-null float64 3 Genre 16713 non-null object 4 NA_sales 16715 non-null float64 5 EU_sales 16715 non-null float64 6 JP_sales 16715 non-null float64 7 Other_sales 16715 non-null float64 8 Critic_Score 8137 non-null float64 9 User_Score 10014 non-null object 10 Rating 9949 non-null object dtypes: float64(6), object(5) memory usage: 1.4+ MB
| Year_of_Release | NA_sales | EU_sales | JP_sales | Other_sales | Critic_Score | |
|---|---|---|---|---|---|---|
| count | 16446.000000 | 16715.000000 | 16715.000000 | 16715.000000 | 16715.000000 | 8137.000000 |
| mean | 2006.484616 | 0.263377 | 0.145060 | 0.077617 | 0.047342 | 68.967679 |
| std | 5.877050 | 0.813604 | 0.503339 | 0.308853 | 0.186731 | 13.938165 |
| min | 1980.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 13.000000 |
| 25% | 2003.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 60.000000 |
| 50% | 2007.000000 | 0.080000 | 0.020000 | 0.000000 | 0.010000 | 71.000000 |
| 75% | 2010.000000 | 0.240000 | 0.110000 | 0.040000 | 0.030000 | 79.000000 |
| max | 2016.000000 | 41.360000 | 28.960000 | 10.220000 | 10.570000 | 98.000000 |
Since 'year_of_release' is missing only 269 entries (1.6% of the dataset), and the year is an important metric in this analysis, I will substitute the median for the missing values in that column. For the score and rating analysis, however, I will omit entries with missing values in those columns: a significant share of them is missing, and substituting could skew our results.
# Check for duplicate entries
data.duplicated().sum()
#display(data[data['User_Score'] == 'tbd'])
0
Key Questions to Answer:¶
- What's the total number of records in our dataset?
- What data types are present in each column?
- Are there any obvious issues with the data?
- Do we see any immediate patterns or anomalies?
We have 16,715 entries, with 11 columns. Mostly float64 dtype, with the expected object type for Name, Platform, Genre, and Rating. We also have 'tbd' lurking within the user_score column, causing it to become object dtype.
Reviewer's comment v1
Well done!
# Convert column names to lowercase
data.columns = data.columns.str.lower()
# Verify the changes
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 16715 entries, 0 to 16714 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16715 non-null object 2 year_of_release 16446 non-null float64 3 genre 16713 non-null object 4 na_sales 16715 non-null float64 5 eu_sales 16715 non-null float64 6 jp_sales 16715 non-null float64 7 other_sales 16715 non-null float64 8 critic_score 8137 non-null float64 9 user_score 10014 non-null object 10 rating 9949 non-null object dtypes: float64(6), object(5) memory usage: 1.4+ MB
column names successfully converted to lowercase
2.2 Data Type Conversion¶
# Check current data types
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 16715 entries, 0 to 16714 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16715 non-null object 2 year_of_release 16446 non-null float64 3 genre 16713 non-null object 4 na_sales 16715 non-null float64 5 eu_sales 16715 non-null float64 6 jp_sales 16715 non-null float64 7 other_sales 16715 non-null float64 8 critic_score 8137 non-null float64 9 user_score 10014 non-null object 10 rating 9949 non-null object dtypes: float64(6), object(5) memory usage: 1.4+ MB
# Fill the missing release years with the median year (2007, per describe() above)
# so the column can be converted to integer type
data['year_of_release'] = data['year_of_release'].fillna(2007)
data['year_of_release'] = data['year_of_release'].astype(int)
data.info()
#display(min(data['year_of_release']), max(data['year_of_release']))
#display(data)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 16715 entries, 0 to 16714 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16715 non-null object 2 year_of_release 16715 non-null int64 3 genre 16713 non-null object 4 na_sales 16715 non-null float64 5 eu_sales 16715 non-null float64 6 jp_sales 16715 non-null float64 7 other_sales 16715 non-null float64 8 critic_score 8137 non-null float64 9 user_score 10014 non-null object 10 rating 9949 non-null object dtypes: float64(5), int64(1), object(5) memory usage: 1.4+ MB
I've decided to fill in the missing year values with the median so the column can be converted to int type, since we only need the year, not the month or day. The resulting range is 1980 - 2016.
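As a side note, the fill value can be derived from the data itself rather than hardcoded; a minimal sketch on a toy frame standing in for the full dataset (the values here are hypothetical):

```python
import pandas as pd

# Toy frame standing in for the full dataset (values are hypothetical)
df = pd.DataFrame({'year_of_release': [2003.0, 2007.0, None, 2010.0, None]})

# Deriving the fill value from the data keeps it correct if the dataset changes
median_year = df['year_of_release'].median()
df['year_of_release'] = df['year_of_release'].fillna(median_year).astype(int)
```

On the real dataset this computed median is the same 2007 used above, so the result is identical.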
# Pay attention to the abbreviation TBD (to be determined). Specify how you intend to handle such cases.
data['user_score'] = data['user_score'].replace('tbd', np.nan)
data['user_score'] = data['user_score'].astype(float)
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 16715 entries, 0 to 16714 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16715 non-null object 2 year_of_release 16715 non-null int64 3 genre 16713 non-null object 4 na_sales 16715 non-null float64 5 eu_sales 16715 non-null float64 6 jp_sales 16715 non-null float64 7 other_sales 16715 non-null float64 8 critic_score 8137 non-null float64 9 user_score 7590 non-null float64 10 rating 9949 non-null object dtypes: float64(6), int64(1), object(4) memory usage: 1.4+ MB
The 'tbd' placeholder was preventing 'user_score' from being the correct dtype needed for future calculations. By changing these values to 'NaN', we can now make the proper dtype conversion, which will allow non-null entries to be used for calculations.
2.3 Handling Missing Values¶
# Examine missing values
display(data[data['name'].isna()])
print()
#display(data[data['user_score'].isna() & data['critic_score'].isna() & data['rating'].isna()])
| name | platform | year_of_release | genre | na_sales | eu_sales | jp_sales | other_sales | critic_score | user_score | rating | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 659 | NaN | GEN | 1993 | NaN | 1.78 | 0.53 | 0.00 | 0.08 | NaN | NaN | NaN |
| 14244 | NaN | GEN | 1993 | NaN | 0.00 | 0.00 | 0.03 | 0.00 | NaN | NaN | NaN |
We have many missing values in the critic_score, user_score, and rating columns, and only 2 entries that are missing a name or genre.
# Calculate percentage of missing values
ng_missing = (len(data[data['name'].isna()]) / len(data)) * 100
critic_missing = (len(data[data['critic_score'].isna()]) / len(data)) * 100
user_missing = (len(data[data['user_score'].isna()]) / len(data)) * 100
rating_missing = (len(data[data['rating'].isna()]) / len(data)) * 100
print(f'Percentage of entries missing "name" & "genre": {ng_missing:.2f}%')
print(f'Percentage of entries missing "critic_score": {critic_missing:.2f}%')
print(f'Percentage of entries missing "user_score": {user_missing:.2f}%')
print(f'Percentage of entries missing "rating": {rating_missing:.2f}%')
Percentage of entries missing "name" & "genre": 0.01% Percentage of entries missing "critic_score": 51.32% Percentage of entries missing "user_score": 54.59% Percentage of entries missing "rating": 40.48%
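The per-column percentages can also be computed in one call rather than column by column; a minimal sketch on a hypothetical miniature of the games table:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the games table
df = pd.DataFrame({
    'critic_score': [76.0, np.nan, 82.0, np.nan],
    'rating': ['E', None, 'E', None],
})

# isna().mean() yields the fraction of missing values per column in one call
missing_pct = df.isna().mean() * 100
```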
# Analyze patterns in missing values
#display(data[data['year_of_release'] < 1990])
#display(data[(data['year_of_release'] < 2000) & (data['platform'] == 'PS')])
display(data[data['year_of_release'] > 2010])
| name | platform | year_of_release | genre | na_sales | eu_sales | jp_sales | other_sales | critic_score | user_score | rating | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Grand Theft Auto V | PS3 | 2013 | Action | 7.02 | 9.09 | 0.98 | 3.96 | 97.0 | 8.2 | M |
| 23 | Grand Theft Auto V | X360 | 2013 | Action | 9.66 | 5.14 | 0.06 | 1.41 | 97.0 | 8.1 | M |
| 29 | Call of Duty: Modern Warfare 3 | X360 | 2011 | Shooter | 9.04 | 4.24 | 0.13 | 1.32 | 88.0 | 3.4 | M |
| 31 | Call of Duty: Black Ops 3 | PS4 | 2015 | Shooter | 6.03 | 5.86 | 0.36 | 2.38 | NaN | NaN | NaN |
| 33 | Pokemon X/Pokemon Y | 3DS | 2013 | Role-Playing | 5.28 | 4.19 | 4.35 | 0.78 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16703 | Strawberry Nauts | PSV | 2016 | Adventure | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
| 16707 | Aiyoku no Eustia | PSV | 2014 | Misc | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
| 16710 | Samurai Warriors: Sanada Maru | PS3 | 2016 | Action | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
| 16712 | Haitaka no Psychedelica | PSV | 2016 | Adventure | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
| 16714 | Winning Post 8 2016 | PSV | 2016 | Simulation | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN |
4022 rows × 11 columns
After testing multiple filters, I cannot find a clear pattern in the entries with missing values. My initial guess was that the rating system was introduced at a certain point in time, perhaps pre-2000, but there are entries with a non-null user or critic score from before 2000, even as far back as the 1980s.
Another theory was that games only get rated once per title, regardless of whether that title released on multiple platforms (e.g., GTA V releasing on both X360 and PS3), but there are entries that disprove this theory as well.
# Handle missing values based on analysis
# Your code here to handle missing values according to your strategy
data.dropna(subset=['name'], inplace=True)
data.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 16713 entries, 0 to 16714 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16713 non-null object 2 year_of_release 16713 non-null int64 3 genre 16713 non-null object 4 na_sales 16713 non-null float64 5 eu_sales 16713 non-null float64 6 jp_sales 16713 non-null float64 7 other_sales 16713 non-null float64 8 critic_score 8137 non-null float64 9 user_score 7590 non-null float64 10 rating 9949 non-null object dtypes: float64(6), int64(1), object(4) memory usage: 1.5+ MB
I've decided to remove the 2 entries missing 'name' and 'genre', since their sales are so small they won't affect our analysis; these 2 entries account for 0.01% of the entire dataset.
Earlier in this analysis I substituted the median (2007) for the missing values in 'year_of_release'. I feel confident in this choice given that the mean is 2006.48.
# Why do you think the values are missing? Give possible reasons.
# Explain why you filled in the missing values as you did or why you decided to leave them blank.
Percentage of entries missing "critic_score": 51.32%
Percentage of entries missing "user_score": 54.59%
Percentage of entries missing "rating": 40.48%
Given the quantity of games here and the subjectivity of scores, it's possible that a score either hadn't been settled on when this data was gathered (which would explain 1985's Super Mario Bros.), or that some of these titles didn't sell enough to be considered for a score. The reality is likely a combination of both. With this much data missing, I believe the best course of action is to leave these values blank and create new DataFrames containing only the valid entries when it comes time to analyze the scores and ratings.
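The plan above can be sketched as follows, using a toy frame in place of the cleaned dataset (the column values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy frame in place of the cleaned dataset (values are hypothetical)
df = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'critic_score': [76.0, np.nan, 82.0],
    'user_score': [8.0, np.nan, np.nan],
})

# Separate frames keep only the rows valid for each analysis,
# leaving the original frame (and its blanks) untouched
critic_data = df.dropna(subset=['critic_score'])
user_data = df.dropna(subset=['user_score'])
```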
Reviewer's comment v1
Well done!
2.4 Calculate Total Sales¶
# Calculate total sales across all regions and put them in a different column
data['total_sales'] = data['na_sales'] + data['eu_sales'] + data['jp_sales'] + data['other_sales']
data.info()
display(data.sample(30, random_state=999))
<class 'pandas.core.frame.DataFrame'> Int64Index: 16713 entries, 0 to 16714 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16713 non-null object 2 year_of_release 16713 non-null int64 3 genre 16713 non-null object 4 na_sales 16713 non-null float64 5 eu_sales 16713 non-null float64 6 jp_sales 16713 non-null float64 7 other_sales 16713 non-null float64 8 critic_score 8137 non-null float64 9 user_score 7590 non-null float64 10 rating 9949 non-null object 11 total_sales 16713 non-null float64 dtypes: float64(7), int64(1), object(4) memory usage: 1.7+ MB
| name | platform | year_of_release | genre | na_sales | eu_sales | jp_sales | other_sales | critic_score | user_score | rating | total_sales | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3256 | Prince of Persia: The Forgotten Sands | X360 | 2010 | Action | 0.30 | 0.26 | 0.00 | 0.06 | 74.0 | 7.8 | T | 0.62 |
| 10572 | Football Director DS | DS | 2008 | Sports | 0.00 | 0.10 | 0.00 | 0.00 | NaN | NaN | NaN | 0.10 |
| 9920 | Spider-Man: Edge of Time | 3DS | 2011 | Action | 0.08 | 0.03 | 0.00 | 0.01 | 50.0 | 6.4 | T | 0.12 |
| 10204 | Snowboard Kids 2 | N64 | 1999 | Sports | 0.09 | 0.02 | 0.00 | 0.00 | NaN | NaN | NaN | 0.11 |
| 13602 | Record of Agarest War: Marriage | PSP | 2012 | Role-Playing | 0.00 | 0.00 | 0.04 | 0.00 | NaN | NaN | NaN | 0.04 |
| 3156 | Real Heroes: Firefighter | Wii | 2009 | Action | 0.56 | 0.04 | 0.00 | 0.05 | 67.0 | 8.2 | E10+ | 0.65 |
| 15800 | Canada Hunt | Wii | 2010 | Sports | 0.02 | 0.00 | 0.00 | 0.00 | NaN | NaN | T | 0.02 |
| 4762 | Tak and the Power of Juju | GC | 2003 | Platform | 0.31 | 0.08 | 0.00 | 0.01 | 71.0 | 8.2 | E | 0.40 |
| 8898 | High Velocity Bowling | PS3 | 2010 | Sports | 0.13 | 0.00 | 0.00 | 0.01 | 66.0 | 8.1 | E10+ | 0.14 |
| 13190 | Harukanaru Toki no Naka de 3 | PS2 | 2004 | Adventure | 0.00 | 0.00 | 0.05 | 0.00 | NaN | NaN | NaN | 0.05 |
| 5243 | Formula 1 Championship Edition | PS | 1997 | Racing | 0.06 | 0.04 | 0.23 | 0.02 | NaN | NaN | NaN | 0.35 |
| 16326 | Victory Spike | PS | 1996 | Sports | 0.00 | 0.00 | 0.01 | 0.00 | NaN | NaN | NaN | 0.01 |
| 5991 | Imagine: Fashion Designer World Tour | DS | 2009 | Simulation | 0.27 | 0.00 | 0.00 | 0.02 | NaN | NaN | E | 0.29 |
| 8472 | TransFormers Animated: The Game | DS | 2008 | Action | 0.15 | 0.00 | 0.00 | 0.01 | 69.0 | NaN | E | 0.16 |
| 11997 | The Legend of Heroes VII: The Trail of Zero | PSV | 2012 | Role-Playing | 0.00 | 0.00 | 0.07 | 0.00 | NaN | NaN | NaN | 0.07 |
| 9040 | Thor: God of Thunder | X360 | 2011 | Action | 0.09 | 0.04 | 0.00 | 0.01 | 38.0 | 5.7 | T | 0.14 |
| 3672 | Top Gun: Fire at Will! | PS | 1996 | Misc | 0.30 | 0.21 | 0.00 | 0.04 | NaN | NaN | NaN | 0.55 |
| 13876 | Doraemon: Shin Nobita no Nihon Tanjou | 3DS | 2016 | Misc | 0.00 | 0.00 | 0.04 | 0.00 | NaN | NaN | NaN | 0.04 |
| 5080 | Imagine: Party Babyz | Wii | 2008 | Simulation | 0.33 | 0.02 | 0.00 | 0.03 | NaN | 6.1 | E | 0.38 |
| 11266 | Monster Jam: Urban Assault | PSP | 2008 | Racing | 0.08 | 0.00 | 0.00 | 0.01 | NaN | NaN | E | 0.09 |
| 10364 | Pro Yakyuu Team o Tsukurou! 2 | DS | 2009 | Sports | 0.00 | 0.00 | 0.11 | 0.00 | NaN | NaN | NaN | 0.11 |
| 3825 | Tiger Woods PGA Tour 11 | X360 | 2010 | Sports | 0.28 | 0.19 | 0.00 | 0.05 | 79.0 | 7.8 | E | 0.52 |
| 9462 | Mega Man Anniversary Collection | XB | 2005 | Platform | 0.10 | 0.03 | 0.00 | 0.00 | 80.0 | 9.2 | E | 0.13 |
| 9336 | Rogue Warrior | PS3 | 2009 | Shooter | 0.11 | 0.01 | 0.00 | 0.01 | 27.0 | 2.8 | M | 0.13 |
| 2802 | Practise English! | DS | 2007 | Misc | 0.00 | 0.00 | 0.73 | 0.00 | NaN | NaN | NaN | 0.73 |
| 3984 | Star Wars: Clone Wars | PS2 | 2002 | Shooter | 0.24 | 0.19 | 0.00 | 0.06 | NaN | NaN | NaN | 0.49 |
| 2936 | Minecraft: Story Mode | PS4 | 2015 | Adventure | 0.25 | 0.33 | 0.00 | 0.11 | NaN | NaN | NaN | 0.69 |
| 722 | Spyro: Season of Ice | GBA | 2001 | Platform | 1.29 | 0.83 | 0.00 | 0.11 | 74.0 | 8.0 | E | 2.23 |
| 12836 | Nobunaga's Ambition: Sphere of Influence - Sen... | PS4 | 2016 | Misc | 0.01 | 0.00 | 0.05 | 0.00 | NaN | NaN | NaN | 0.06 |
| 7160 | NCAA Football 2005 | GC | 2004 | Sports | 0.17 | 0.04 | 0.00 | 0.01 | 88.0 | 9.0 | E | 0.22 |
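A note on the total-sales step above: summing over a list of region columns is an equivalent alternative to chained additions, and it needs no edits if another region column is ever added. A minimal sketch on hypothetical sales rows:

```python
import pandas as pd

# Hypothetical regional sales rows
df = pd.DataFrame({
    'na_sales': [1.0, 0.5],
    'eu_sales': [0.5, 0.2],
    'jp_sales': [0.1, 0.0],
    'other_sales': [0.05, 0.01],
})

# Summing across a column list is equivalent to adding the columns one by one
region_cols = ['na_sales', 'eu_sales', 'jp_sales', 'other_sales']
df['total_sales'] = df[region_cols].sum(axis=1)
```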
# Create a DataFrame with game releases by year
year_data = data[['name', 'platform', 'year_of_release']]
year_data = year_data.sort_values(by='year_of_release')
display(year_data)
year_data.info()
| name | platform | year_of_release | |
|---|---|---|---|
| 1968 | Defender | 2600 | 1980 |
| 6300 | Bridge | 2600 | 1980 |
| 1764 | Kaboom! | 2600 | 1980 |
| 546 | Missile Command | 2600 | 1980 |
| 5360 | Freeway | 2600 | 1980 |
| ... | ... | ... | ... |
| 14404 | Prison Architect | PS4 | 2016 |
| 11168 | Terraria | 3DS | 2016 |
| 14414 | Atelier Shallie Plus: Alchemists of the Dusk Sea | PSV | 2016 |
| 11243 | Assassin's Creed Chronicles | PS4 | 2016 |
| 16714 | Winning Post 8 2016 | PSV | 2016 |
16713 rows × 3 columns
<class 'pandas.core.frame.DataFrame'> Int64Index: 16713 entries, 1968 to 16714 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 16713 non-null object 1 platform 16713 non-null object 2 year_of_release 16713 non-null int64 dtypes: int64(1), object(2) memory usage: 522.3+ KB
# Visualize the distribution of games across years
fig = px.histogram(year_data, x='year_of_release', width=1000, height=600)
fig.update_layout(title_text='Quantity of Games Released by Year',
xaxis_title_text='Year', yaxis_title_text='Quantity')
fig.show()
# Display summary statistics for each year
#Summary Statistics Calculation for the Entire Sample:
def mean_var_stdv(series):
mean = series.mean()
var = series.var()
stdv = mt.sqrt(var)
return f'Mean: {mean}\nVariance: {var}\nSt.Deviation: {stdv}'
print(f'Temporal Summary Statistics for all Observable Years:\n\n{mean_var_stdv(year_data["year_of_release"])}\n')
print(f'-~-~-~-' * 21)
print(f'\nDataFrame Containing Amount of Games Released and Corresponding Percentage, Grouped by Platform and Year:\n')
#Creating new DataFrame containing the amount of titles released each year by platform, as well as the corresponding share percentage (ex. 10 releases in one year / 100 releases across all years = 10% of sales in that year)
total_by_year = year_data.groupby(['platform', 'year_of_release'])['name'].count().reset_index()
total_by_year = total_by_year.rename(columns={'year_of_release' : 'year', 'name' : 'titles_released_count'})
lifetime_total_series = total_by_year.groupby('platform')['titles_released_count'].transform('sum')
total_by_year['platform_lifetime_percentage'] = ((total_by_year['titles_released_count'] / lifetime_total_series) * 100).map('{:.2f}%'.format)
display(total_by_year)
#display(lifetime_total_series)
Temporal Summary Statistics for all Observable Years: Mean: 2006.4945252198886 Variance: 33.970305711920766 St.Deviation: 5.828405074453968 -~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~--~-~-~- DataFrame Containing Amount of Games Released and Corresponding Percentage, Grouped by Platform and Year:
| platform | year | titles_released_count | platform_lifetime_percentage | |
|---|---|---|---|---|
| 0 | 2600 | 1980 | 9 | 6.77% |
| 1 | 2600 | 1981 | 46 | 34.59% |
| 2 | 2600 | 1982 | 36 | 27.07% |
| 3 | 2600 | 1983 | 11 | 8.27% |
| 4 | 2600 | 1984 | 1 | 0.75% |
| ... | ... | ... | ... | ... |
| 239 | XB | 2008 | 1 | 0.12% |
| 240 | XOne | 2013 | 19 | 7.69% |
| 241 | XOne | 2014 | 61 | 24.70% |
| 242 | XOne | 2015 | 80 | 32.39% |
| 243 | XOne | 2016 | 87 | 35.22% |
244 rows × 4 columns
1. The temporal summary statistics match what we can see in the previously shown histogram, with the peak lying somewhere between 2006 and 2010. The distribution has a longer tail on the left side of the median, pulling the mean slightly lower than a balanced distribution would give. This is also supported by the range of the data: the earliest year (1980) sits about 4.5 standard deviations below the mean (2006.49 - 4.5 * 5.83), while the latest year (2016) sits under 2 standard deviations above it.
2. Summary statistics by platform and year: this DataFrame contains one row per platform-year combination, with the number of games released as the count. Each row's percentage is that year's release count divided by the total number of games released over the platform's entire lifespan.
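The lifetime-share calculation described in point 2 can be sanity-checked on a toy frame (the counts here are hypothetical): within each platform, the shares should total 100%.

```python
import pandas as pd

# Hypothetical release counts per platform and year
totals = pd.DataFrame({
    'platform': ['A', 'A', 'B'],
    'year': [2000, 2001, 2000],
    'titles_released_count': [10, 30, 5],
})

# Lifetime share: a year's count divided by the platform's lifetime total
lifetime = totals.groupby('platform')['titles_released_count'].transform('sum')
totals['platform_lifetime_pct'] = totals['titles_released_count'] / lifetime * 100

# Sanity check: shares within each platform should sum to 100
share_sums = totals.groupby('platform')['platform_lifetime_pct'].sum()
```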
Questions to Consider:¶
- Which years show significant numbers of game releases?
- Are there any notable trends or patterns in the number of releases?
- Is there enough recent data to make predictions for 2017?
Our first sharp increase in releases began in 2000, with a 236.86% jump over the following two years (2000 - 2002). Something to consider: "The Matrix" trilogy was unfolding at the time (1999 - 2003) and brought a wider audience into 'cyberspace', which may have contributed to this surge. The bulk of our data lies between 2007 and 2011, a period of consistent growth in the gaming market. The introduction of YouTube in 2005 also shifted the gaming market considerably, partly explaining the large increase mentioned above.
I believe this dataset allows us to make predictions based on observable trends; however, as with weather forecasts, there is no certainty about the future.
Reviewer's comment v1
Great job!
3.2 Platform Sales Analysis Over Time¶
Now let's analyze how sales vary across platforms and years:
# Calculate total sales by platform and year
plt_yr_total = data.groupby(['platform', 'year_of_release'])['total_sales'].sum()
plt_yr_total = plt_yr_total.unstack(level='year_of_release')
plt_yr_total = plt_yr_total.fillna(0)
display(plt_yr_total)
plt_yr_total.info()
#display(data['platform'].value_counts())
#year_plt_sales = data[['name', 'platform', 'total_sales', 'year_of_release']]
#year_plt_sales = year_plt_sales.sort_values(by='year_of_release')
#year_plt_sales.info()
#display(year_plt_sales)
| year_of_release | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| platform | |||||||||||||||||||||
| 2600 | 11.38 | 35.68 | 28.88 | 5.84 | 0.27 | 0.45 | 0.67 | 1.94 | 0.74 | 0.63 | ... | 10.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3DO | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3DS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 1.19 | 0.00 | 0.00 | 0.00 | 63.20 | 51.36 | 56.57 | 43.76 | 27.78 | 15.14 |
| DC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.02 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| DS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 150.28 | 145.31 | 119.54 | 85.02 | 26.18 | 11.01 | 1.54 | 0.00 | 0.00 | 0.00 |
| GB | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.43 | 64.97 | ... | 1.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GBA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 8.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GEN | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| N64 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.67 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| NES | 0.00 | 0.00 | 0.00 | 10.96 | 50.08 | 53.44 | 36.41 | 19.76 | 45.01 | 7.85 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| NG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| PC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.03 | 0.00 | ... | 13.04 | 12.42 | 16.91 | 24.28 | 35.03 | 23.22 | 12.38 | 13.28 | 8.52 | 5.25 |
| PCFX | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| PS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 3.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| PS2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 98.20 | 53.90 | 26.40 | 5.64 | 0.45 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| PS3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 81.50 | 118.52 | 130.93 | 142.17 | 156.78 | 107.36 | 113.25 | 47.76 | 16.82 | 3.60 |
| PS4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 25.99 | 100.00 | 118.90 | 69.25 |
| PSP | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 51.45 | 34.55 | 37.78 | 35.04 | 17.82 | 7.69 | 3.14 | 0.24 | 0.12 | 0.00 |
| PSV | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.26 | 0.00 | 0.00 | 0.00 | 4.63 | 16.19 | 10.59 | 11.90 | 6.25 | 4.25 |
| SAT | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SCD | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SNES | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TG16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| WS | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Wii | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 169.10 | 171.32 | 206.97 | 127.95 | 59.65 | 21.71 | 8.59 | 3.75 | 1.14 | 0.18 |
| WiiU | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17.56 | 21.65 | 22.03 | 16.35 | 4.60 |
| X360 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 105.59 | 135.26 | 120.29 | 170.03 | 143.84 | 99.74 | 88.58 | 34.74 | 11.96 | 1.52 |
| XB | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 6.72 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| XOne | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 18.96 | 54.07 | 60.14 | 26.15 |
31 rows × 37 columns
<class 'pandas.core.frame.DataFrame'> Index: 31 entries, 2600 to XOne Data columns (total 37 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 1980 31 non-null float64 1 1981 31 non-null float64 2 1982 31 non-null float64 3 1983 31 non-null float64 4 1984 31 non-null float64 5 1985 31 non-null float64 6 1986 31 non-null float64 7 1987 31 non-null float64 8 1988 31 non-null float64 9 1989 31 non-null float64 10 1990 31 non-null float64 11 1991 31 non-null float64 12 1992 31 non-null float64 13 1993 31 non-null float64 14 1994 31 non-null float64 15 1995 31 non-null float64 16 1996 31 non-null float64 17 1997 31 non-null float64 18 1998 31 non-null float64 19 1999 31 non-null float64 20 2000 31 non-null float64 21 2001 31 non-null float64 22 2002 31 non-null float64 23 2003 31 non-null float64 24 2004 31 non-null float64 25 2005 31 non-null float64 26 2006 31 non-null float64 27 2007 31 non-null float64 28 2008 31 non-null float64 29 2009 31 non-null float64 30 2010 31 non-null float64 31 2011 31 non-null float64 32 2012 31 non-null float64 33 2013 31 non-null float64 34 2014 31 non-null float64 35 2015 31 non-null float64 36 2016 31 non-null float64 dtypes: float64(37) memory usage: 9.2+ KB
# Create a heatmap of platform sales over time
fig = px.imshow(plt_yr_total, labels=dict(x="Year", y="Platform", color="Total Sales"),
x=plt_yr_total.columns, y=plt_yr_total.index, width=1000, height=800,
title='Total Game Sales (USD Million) by Platform and Year')
fig.show()
# Identify platforms with declining sales
Any platform that has been usurped by newer technology begins to decline, as we can see with 'PS', 'PS2', 'PS3', and 'PS4'. When sales peak and then fall, the market is seemingly migrating to the next big thing. There are additional, harder-to-measure factors that affect sales; for example, the announcement of a newer console may lead people to halt purchases and wait for the newer hardware. Another factor worth considering is console-exclusive titles, which may pull certain consumers from one platform to another if they don't wish to own multiple consoles.
Questions to Consider:¶
- Which platforms show consistent sales over time?
- Can you identify platforms that have disappeared from the market?
- What's the typical lifecycle of a gaming platform?
The leading platform in sales consistency has to be 'PC' at 31 years (1985 - 2016), which makes sense since there are no definitive hardware constraints by which to divide PC into generations.
'GB' comes in second, with consistent sales across a surprising 13-year span (1988 - 2001). Finally, 'NES' comes in third for consistency in sales over time with 11 years (1983 - 1994).
When looking for 'consistency', we want the most stable record over a given time period, excluding platforms that saw meteoric rises before succumbing to their successors (as in the case of PlayStation's four generations and multiple handhelds).
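The lifespan figures above can be computed directly from the data. A minimal sketch, assuming a frame with `platform` and `year_of_release` columns as in this notebook (the sample rows here are hypothetical stand-ins for the full dataset):

```python
import pandas as pd

# Hypothetical sample in the shape of this notebook's data
games = pd.DataFrame({
    'platform': ['PC', 'PC', 'NES', 'NES', 'GB'],
    'year_of_release': [1985, 2016, 1983, 1994, 1988],
})

# First and last year each platform recorded a release, and the span between
lifespan = games.groupby('platform')['year_of_release'].agg(['min', 'max'])
lifespan['years_active'] = lifespan['max'] - lifespan['min']
print(lifespan.sort_values('years_active', ascending=False))
```

Running the same aggregation on the real dataset reproduces the 31-, 13-, and 11-year spans cited above.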
Reviewer's comment v1
Great job!
3.3 Determining Relevant Time Period¶
Based on your analysis above, determine the appropriate time period for predicting 2017 sales:
# Your code here to filter the dataset to relevant years
# Example:
# relevant_years = [XXXX, XXXX, XXXX] # Replace with your chosen years
# df_relevant = df[df['year_of_release'].isin(relevant_years)]
# Justify your choice with data
select_years = [2011, 2012, 2013, 2014, 2015, 2016]
data_select_years = data[['platform', 'year_of_release', 'total_sales']]
data_select_years = data_select_years[data_select_years['year_of_release'].isin(select_years)].sort_values(by='year_of_release')
data_select_years = data_select_years.groupby(['platform', 'year_of_release'])['total_sales'].sum().reset_index()
#data_select_years = data_select_years.unstack(level='year_of_release').fillna(0)
display(data_select_years.head(20))
#data_select_years.info()
| platform | year_of_release | total_sales | |
|---|---|---|---|
| 0 | 3DS | 2011 | 63.20 |
| 1 | 3DS | 2012 | 51.36 |
| 2 | 3DS | 2013 | 56.57 |
| 3 | 3DS | 2014 | 43.76 |
| 4 | 3DS | 2015 | 27.78 |
| 5 | 3DS | 2016 | 15.14 |
| 6 | DS | 2011 | 26.18 |
| 7 | DS | 2012 | 11.01 |
| 8 | DS | 2013 | 1.54 |
| 9 | PC | 2011 | 35.03 |
| 10 | PC | 2012 | 23.22 |
| 11 | PC | 2013 | 12.38 |
| 12 | PC | 2014 | 13.28 |
| 13 | PC | 2015 | 8.52 |
| 14 | PC | 2016 | 5.25 |
| 15 | PS2 | 2011 | 0.45 |
| 16 | PS3 | 2011 | 156.78 |
| 17 | PS3 | 2012 | 107.36 |
| 18 | PS3 | 2013 | 113.25 |
| 19 | PS3 | 2014 | 47.76 |
Document Your Decision:¶
- What years did you select and why?
- How does this period reflect current market conditions?
- What factors influenced your decision?
I've decided to go with 2011 - 2016, since it should be easier to make a prediction from a stable trajectory. Kind of like analyzing the influence of gravity on a projectile.
As for current market conditions, we don't know for certain the reason for the declining number of title releases over time. However, we have to acknowledge the downward trend, since titles have to be released in order for sales to occur, and that is the primary reason for my time-frame selection.
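The claim about declining title releases can be checked with a simple per-year count. A sketch with hypothetical rows standing in for the dataset's `name` and `year_of_release` columns:

```python
import pandas as pd

# Hypothetical sample; the real frame has one row per title
games = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'year_of_release': [2011, 2011, 2012, 2013, 2013, 2014],
})

# Number of titles released per year -- a falling count caps potential sales
releases_per_year = games.groupby('year_of_release')['name'].count()
print(releases_per_year)
```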
Reviewer's comment v1
Great time frame decision!
3.4 Platform Performance Analysis¶
Using your selected time period, let's analyze platform performance:
# Analyze platform sales trends
display(data_select_years.head(10))
| platform | year_of_release | total_sales | |
|---|---|---|---|
| 0 | 3DS | 2011 | 63.20 |
| 1 | 3DS | 2012 | 51.36 |
| 2 | 3DS | 2013 | 56.57 |
| 3 | 3DS | 2014 | 43.76 |
| 4 | 3DS | 2015 | 27.78 |
| 5 | 3DS | 2016 | 15.14 |
| 6 | DS | 2011 | 26.18 |
| 7 | DS | 2012 | 11.01 |
| 8 | DS | 2013 | 1.54 |
| 9 | PC | 2011 | 35.03 |
With the first 10 rows visible, we can see that the 'DS' was at the tail end of its lifecycle (2011 - 2013), 'passing the torch' to the next-generation console, the '3DS'. Both platforms saw sales concurrently (2011 - 2013) until titles were no longer being developed for the 'parent' console ('DS'), giving way to the 'child' console ('3DS'). This trend occurs across multiple platforms in this analysis, the only exception being 'PC', where no generational gaps can be distinguished.
# Sort platforms by total sales
#data_select_years['total_sales'] = data_select_years.sum(axis=1)
data_select_years = data_select_years.sort_values(by='total_sales')
display(data_select_years.head(20))
data_select_years.info()
| platform | year_of_release | total_sales | |
|---|---|---|---|
| 30 | PSP | 2015 | 0.12 |
| 42 | Wii | 2016 | 0.18 |
| 29 | PSP | 2014 | 0.24 |
| 15 | PS2 | 2011 | 0.45 |
| 41 | Wii | 2015 | 1.14 |
| 53 | X360 | 2016 | 1.52 |
| 8 | DS | 2013 | 1.54 |
| 28 | PSP | 2013 | 3.14 |
| 21 | PS3 | 2016 | 3.60 |
| 40 | Wii | 2014 | 3.75 |
| 36 | PSV | 2016 | 4.25 |
| 47 | WiiU | 2016 | 4.60 |
| 31 | PSV | 2011 | 4.63 |
| 14 | PC | 2016 | 5.25 |
| 35 | PSV | 2015 | 6.25 |
| 27 | PSP | 2012 | 7.69 |
| 13 | PC | 2015 | 8.52 |
| 39 | Wii | 2013 | 8.59 |
| 33 | PSV | 2013 | 10.59 |
| 7 | DS | 2012 | 11.01 |
<class 'pandas.core.frame.DataFrame'> Int64Index: 58 entries, 30 to 16 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 platform 58 non-null object 1 year_of_release 58 non-null int64 2 total_sales 58 non-null float64 dtypes: float64(1), int64(1), object(1) memory usage: 1.8+ KB
# Visualize top platforms
# Calculate year-over-year growth for each platform
# Your code here to calculate and visualize platform growth rates
#fig = px.histogram(data_select_years, x='platform', y='total_sales',
#color='year_of_release', barmode='group', height=500,
#labels={'total_sales' : 'Total Sales'})
#Visualizing platform performance by year, and changes in sales over time:
data_select_years = data_select_years.sort_values(by='year_of_release')
# plotly.express exposes its color packs directly via px.colors
fig = px.line(data_select_years, x='year_of_release', y='total_sales',
              color='platform', symbol='platform', height=800,
              labels={'total_sales' : 'Total Sales (USD Million)', 'year_of_release' : 'Year'},
              title='Which Platform Sold the Most Titles from 2011 to 2016?',
              color_discrete_sequence=px.colors.qualitative.Dark24)
fig.show()
I've chosen a line graph in order to showcase both the yearly standing and the changes over time for each console. With this many platforms on display, it was vital to ensure visual clarity by using an external color pack and per-platform symbols.
As stated earlier, there are multiple generations of platform on display here. New hardware launches, and there are a few years of simultaneous market activity in both, before the predecessor is inevitably rendered obsolete once titles cease to launch for it. Some platforms have milder and more consistent market trends (Wii & WiiU) than the top-selling platforms (PS3 & PS4), but loosely follow the same pattern of generational succession.
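The commented-out year-over-year growth step above could be done with pandas' `pct_change` inside a per-platform group. A sketch using a few hypothetical sales figures in the same `platform` / `year_of_release` / `total_sales` shape:

```python
import pandas as pd

# Hypothetical per-platform annual sales (USD million)
sales = pd.DataFrame({
    'platform': ['3DS', '3DS', '3DS', 'PS3', 'PS3'],
    'year_of_release': [2011, 2012, 2013, 2011, 2012],
    'total_sales': [63.20, 51.36, 56.57, 156.78, 107.36],
})

# Year-over-year growth per platform (fractional change vs. the prior year)
sales = sales.sort_values(['platform', 'year_of_release'])
sales['yoy_growth'] = sales.groupby('platform')['total_sales'].pct_change()
print(sales)
```

Each platform's first year is NaN (no prior year to compare against), and negative values mark the declining phase of a platform's lifecycle.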
Reviewer's comment v1
Well done!
3.5 Sales Distribution Analysis¶
Let's examine the distribution of sales across platforms:
# Create box plot of sales by platform
fig = px.box(data_select_years, x='platform', y='total_sales',
color='platform', height=700, labels={'total_sales' : 'Total Sales (USD Million)', 'platform' : 'Platform'},
points='all', title='2011 to 2016: Annual Sales Distribution Analysis')
fig.show()
#display(data_select_years)
This box plot shows us a similar pattern in a different format. As the older generations of platform trend downward, the closer a platform's min value is to 0, the closer that platform is to the end of its market life cycle. Sales still refer to games sold on each platform.
# Calculate detailed statistics for each platform
boxplot_summary_stats = data_select_years.groupby('platform')['total_sales'].describe()
display(boxplot_summary_stats)
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| platform | ||||||||
| 3DS | 6.0 | 42.968333 | 18.278261 | 15.14 | 31.7750 | 47.560 | 55.2675 | 63.20 |
| DS | 3.0 | 12.910000 | 12.429397 | 1.54 | 6.2750 | 11.010 | 18.5950 | 26.18 |
| PC | 6.0 | 16.280000 | 11.007403 | 5.25 | 9.4850 | 12.830 | 20.7350 | 35.03 |
| PS2 | 1.0 | 0.450000 | NaN | 0.45 | 0.4500 | 0.450 | 0.4500 | 0.45 |
| PS3 | 6.0 | 74.261667 | 60.695667 | 3.60 | 24.5550 | 77.560 | 111.7775 | 156.78 |
| PS4 | 4.0 | 78.535000 | 40.567917 | 25.99 | 58.4350 | 84.625 | 104.7250 | 118.90 |
| PSP | 5.0 | 5.802000 | 7.387558 | 0.12 | 0.2400 | 3.140 | 7.6900 | 17.82 |
| PSV | 6.0 | 8.968333 | 4.729695 | 4.25 | 5.0350 | 8.420 | 11.5725 | 16.19 |
| Wii | 6.0 | 15.836667 | 22.864073 | 0.18 | 1.7925 | 6.170 | 18.4300 | 59.65 |
| WiiU | 5.0 | 16.438000 | 7.068244 | 4.60 | 16.3500 | 17.560 | 21.6500 | 22.03 |
| X360 | 6.0 | 63.396667 | 56.074638 | 1.52 | 17.6550 | 61.660 | 96.9500 | 143.84 |
| XOne | 4.0 | 39.830000 | 20.313977 | 18.96 | 24.3525 | 40.110 | 55.5875 | 60.14 |
A more refined look at our data, but the same message is conveyed: min values and their proximity to the mean tell us what kind of trajectory our platforms are on. 'PC' emerged in the mid-1980s and kept a consistent trend up until 2016. On the other hand we have the 'PS3', the best-performing console within this sample, but rapidly approaching zero as its successor, the 'PS4', enters the market.
3.6 Review Score Impact Analysis¶
Select a popular platform and analyze how reviews affect sales:
# Choose a popular platform based on your previous analysis
popular_platform = data[(data['platform'] == 'PS3') & (data['year_of_release'] >= 2011)][['name', 'year_of_release', 'critic_score', 'user_score', 'total_sales']].dropna()
display(popular_platform)
| name | year_of_release | critic_score | user_score | total_sales | |
|---|---|---|---|---|---|
| 16 | Grand Theft Auto V | 2013 | 97.0 | 8.2 | 21.05 |
| 34 | Call of Duty: Black Ops II | 2012 | 83.0 | 5.3 | 13.79 |
| 37 | Call of Duty: Modern Warfare 3 | 2011 | 88.0 | 3.2 | 13.33 |
| 69 | Call of Duty: Ghosts | 2013 | 71.0 | 2.6 | 9.36 |
| 81 | FIFA Soccer 13 | 2012 | 88.0 | 6.6 | 8.17 |
| ... | ... | ... | ... | ... | ... |
| 15533 | Battle Princess of Arcadias | 2013 | 69.0 | 7.9 | 0.02 |
| 15567 | Tom Clancy's Splinter Cell Classic Trilogy HD | 2011 | 67.0 | 6.8 | 0.02 |
| 15742 | Invizimals: The Lost Kingdom | 2013 | 50.0 | 5.0 | 0.02 |
| 16146 | Don Bradman Cricket 14 | 2014 | 73.0 | 6.9 | 0.01 |
| 16230 | Madden NFL 13 | 2012 | 83.0 | 5.5 | 0.01 |
342 rows × 5 columns
The console that saw the most sales in our selective analysis was undoubtedly the 'PS3', with the largest mean/std pairing of all platforms. This lets us peer into the vastness of this platform and its market reach.
Since we're still under "3.x" I will continue using the selected years for further analysis until "4.0", or where explicitly stated otherwise.
As I mentioned at the beginning of this analysis, I will be omitting all entries with missing values when analyzing 'critic_score' and 'user_score', as these entries have no value from this angle of observation.
# Create scatter plots for both critic and user scores
popular_platform_twinbox = popular_platform.melt(id_vars=['name', 'year_of_release', 'total_sales'],
value_vars=['critic_score', 'user_score'], var_name='score_type',
value_name='score')
popular_platform_twinbox['score'] = popular_platform_twinbox.apply(lambda row: row['score'] * 10 if row['score_type'] == 'user_score' else
row['score'], axis=1)
#display(popular_platform_twinbox)
fig = px.strip(popular_platform_twinbox, x='year_of_release', y='score',
color='score_type', facet_col='score_type', height=900, hover_data=['name'],
labels={'year_of_release' : 'Year', 'score' : 'Score (out of 100)'},
title='How Similar are the Review Scores of Critics and Users?')
fig.update_traces(jitter=1)
fig.show()
Scores between the two sources seem mildly similar at a glance, with some entries within 5 points of each other (2013's 'The Last of Us' at 95 vs 91) and some over 10 points apart (2013's 'Grand Theft Auto V' at 97 vs 82). There may be any number of reasons for the differing scores, but in my experience within this domain, critics tend to have different motives for playing than users (duty vs leisure).
It's also worth considering that critics may receive the title for free before review, since their scores can influence the market. With this in mind, reviewing a game that costs nothing may positively skew the reviewer's experience (critic_score), while paying for the title sets an expectation of the product (user_score). This dynamic will not hold in every case, and could be one of many factors, but in the case of 'MindJack', released in 2011, it potentially explains the massive gap between a Critic Score of 44 and a User Score of 17.
# Critic Scores
critic_scores = popular_platform['critic_score']
# User Scores
user_scores = popular_platform['user_score']
# Calculate correlations
score_correlation = critic_scores.corr(user_scores)
print(score_correlation)
0.5633145093196069
With a correlation coefficient of 0.56 between the two score types, we can say there is a moderate positive correlation: critics and users have similar, but not identical, taste.
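The same `.corr()` approach can also relate each score type to sales, which is the section's core question. A sketch with a handful of hypothetical PS3 rows in the notebook's column layout:

```python
import pandas as pd

# Hypothetical sample of PS3 titles (scores and sales in USD million)
ps3 = pd.DataFrame({
    'critic_score': [97.0, 83.0, 88.0, 71.0, 88.0, 50.0],
    'user_score':   [8.2, 5.3, 3.2, 2.6, 6.6, 5.0],
    'total_sales':  [21.05, 13.79, 13.33, 9.36, 8.17, 0.02],
})

# Pearson correlation of each review-score column with total sales
critic_sales_corr = ps3['critic_score'].corr(ps3['total_sales'])
user_sales_corr = ps3['user_score'].corr(ps3['total_sales'])
print(critic_sales_corr, user_sales_corr)
```

Comparing the two coefficients on the full dataset would show which score type tracks commercial success more closely.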
3.7 Cross-Platform Comparison¶
Compare sales performance of games across different platforms:
# Find games released on multiple platforms
cross_platform = data[(data['year_of_release'] >= 2011) & (data['total_sales'] >= 1.0)][['name', 'year_of_release', 'platform', 'total_sales']]
cross_platform = cross_platform[cross_platform['name'].duplicated(keep=False)]
display(cross_platform)
cross_platform.info()
| name | year_of_release | platform | total_sales | |
|---|---|---|---|---|
| 16 | Grand Theft Auto V | 2013 | PS3 | 21.05 |
| 23 | Grand Theft Auto V | 2013 | X360 | 16.27 |
| 29 | Call of Duty: Modern Warfare 3 | 2011 | X360 | 14.73 |
| 31 | Call of Duty: Black Ops 3 | 2015 | PS4 | 14.63 |
| 34 | Call of Duty: Black Ops II | 2012 | PS3 | 13.79 |
| ... | ... | ... | ... | ... |
| 2034 | Skylanders SWAP Force | 2013 | PS3 | 1.02 |
| 2036 | Sonic Generations | 2011 | 3DS | 1.01 |
| 2048 | Need for Speed: Most Wanted | 2012 | PSV | 1.01 |
| 2050 | Assassin's Creed: Rogue | 2014 | X360 | 1.01 |
| 2091 | Dark Souls | 2011 | X360 | 1.00 |
254 rows × 4 columns
<class 'pandas.core.frame.DataFrame'> Int64Index: 254 entries, 16 to 2091 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 254 non-null object 1 year_of_release 254 non-null int64 2 platform 254 non-null object 3 total_sales 254 non-null float64 dtypes: float64(1), int64(1), object(2) memory usage: 9.9+ KB
# Compare sales across platforms for these games
# Your code here to analyze and visualize cross-platform performance
fig = px.scatter(cross_platform, x='name', y='total_sales',
color='name', symbol='platform', height=800, size='total_sales',
labels={'total_sales' : 'Total Sales (USD Million)', 'name' : 'Game Title', 'platform' : 'Platform'},
title='How Well do Cross-Platform Games Perform?')
fig.show()
This is a ton of data points, so for the sake of visual clarity I've decided to omit any games that did not cross USD 1 million in sales.
With that in mind, we can see that games released on multiple platforms tend to net more sales overall, with some of the largest titles releasing on consecutive generations of platform, which we can easily see thanks to plotly's legend layout.
Earlier platform-popularity observations extend to this analysis: there is a hierarchy of which console sells the most games. 'PS3' will almost always lead, with 'X360' following close behind, and the next generations ('PS4' & 'XOne') making their debut, with some titles re-released on the newer hardware.
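The per-title platform comparison can be made explicit with a pivot. A sketch, using two titles from the table above plus a hypothetical PS3 figure for 'Dark Souls':

```python
import pandas as pd

# Two cross-platform titles; the Dark Souls PS3 figure is hypothetical
cross = pd.DataFrame({
    'name': ['Grand Theft Auto V', 'Grand Theft Auto V', 'Dark Souls', 'Dark Souls'],
    'platform': ['PS3', 'X360', 'PS3', 'X360'],
    'total_sales': [21.05, 16.27, 1.32, 1.00],
})

# One row per game, one column per platform, sales summed per cell
by_platform = cross.pivot_table(index='name', columns='platform',
                                values='total_sales', aggfunc='sum')
print(by_platform)
```

In this wide form, comparing columns row-by-row directly shows which platform each title sold best on.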
3.8 Genre Analysis¶
Finally, let's examine the distribution of games by genre:
# Analyze genre performance
genre_analysis = data[data['year_of_release'] >= 2011][['genre', 'total_sales']]
genre_analysis = genre_analysis.groupby('genre')['total_sales'].sum().reset_index()
display(genre_analysis)
genre_analysis.info()
| genre | total_sales | |
|---|---|---|
| 0 | Action | 557.80 |
| 1 | Adventure | 45.04 |
| 2 | Fighting | 66.90 |
| 3 | Misc | 139.42 |
| 4 | Platform | 88.68 |
| 5 | Puzzle | 9.85 |
| 6 | Racing | 88.31 |
| 7 | Role-Playing | 245.63 |
| 8 | Shooter | 402.91 |
| 9 | Simulation | 50.26 |
| 10 | Sports | 237.15 |
| 11 | Strategy | 22.16 |
<class 'pandas.core.frame.DataFrame'> RangeIndex: 12 entries, 0 to 11 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 genre 12 non-null object 1 total_sales 12 non-null float64 dtypes: float64(1), object(1) memory usage: 320.0+ bytes
# Sort genres by total sales
genre_analysis = genre_analysis.sort_values(by='total_sales', ascending=False)
display(genre_analysis)
| genre | total_sales | |
|---|---|---|
| 0 | Action | 557.80 |
| 8 | Shooter | 402.91 |
| 7 | Role-Playing | 245.63 |
| 10 | Sports | 237.15 |
| 3 | Misc | 139.42 |
| 4 | Platform | 88.68 |
| 6 | Racing | 88.31 |
| 2 | Fighting | 66.90 |
| 9 | Simulation | 50.26 |
| 1 | Adventure | 45.04 |
| 11 | Strategy | 22.16 |
| 5 | Puzzle | 9.85 |
# Visualize genre distribution
fig = px.bar(genre_analysis, x='genre', y='total_sales', height=600,
color='genre', labels={'total_sales' : 'Total Sales (USD Million)', 'genre' : 'Genre'},
title='Which Genre Sells the Most?')
fig.update_layout(showlegend=False)
fig.show()
# Calculate market share for each genre
sales_sum = genre_analysis['total_sales'].sum()
genre_analysis['market_share'] = ((genre_analysis['total_sales'] / sales_sum) * 100).map('{:.2f}%'.format)
display(genre_analysis)
#market_share_100 = genre_analysis['market_share'].sum()
#print(market_share_100)
| genre | total_sales | market_share | |
|---|---|---|---|
| 0 | Action | 557.80 | 28.54% |
| 8 | Shooter | 402.91 | 20.62% |
| 7 | Role-Playing | 245.63 | 12.57% |
| 10 | Sports | 237.15 | 12.14% |
| 3 | Misc | 139.42 | 7.13% |
| 4 | Platform | 88.68 | 4.54% |
| 6 | Racing | 88.31 | 4.52% |
| 2 | Fighting | 66.90 | 3.42% |
| 9 | Simulation | 50.26 | 2.57% |
| 1 | Adventure | 45.04 | 2.30% |
| 11 | Strategy | 22.16 | 1.13% |
| 5 | Puzzle | 9.85 | 0.50% |
Action games are leading the pack substantially, with Shooter trailing close behind. These 2 genres make up almost half (~49%) of the total gaming market in recent years (2011 - 2016) by themselves. Again, this matches what we have been seeing in previous observations: 'Grand Theft Auto V' and the 'Call of Duty' series are the best-selling titles in this sample, and having been released on multiple platforms across multiple generations secures their market positions further.
Reviewer's comment v1
Great job!
Key Questions for Genre Analysis:¶
- Which genres consistently perform well?
- Are there any genres showing recent growth or decline?
- How does the average performance vary across genres?
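The "average performance" question above is worth separating from totals, since a genre can lead simply by having many releases. A sketch with hypothetical rows, dividing sales by title count per genre:

```python
import pandas as pd

# Hypothetical titles with genre and total sales (USD million)
games = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'genre': ['Action', 'Action', 'Shooter', 'Puzzle', 'Puzzle'],
    'total_sales': [4.0, 2.0, 9.0, 0.5, 0.5],
})

# Mean sales per title distinguishes broad genres from genuinely strong ones
avg_by_genre = games.groupby('genre')['total_sales'].agg(['count', 'sum', 'mean'])
print(avg_by_genre.sort_values('mean', ascending=False))
```

On the real dataset this would reveal whether Action leads because its typical title sells well or merely because so many Action titles exist.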
Step 4: Regional Market Analysis and User Profiles¶
In this section, we will analyze the gaming market characteristics across three major regions: North America (NA), Europe (EU), and Japan (JP). Our analysis will focus on platform preferences, genre popularity, and the impact of ESRB ratings in each region.
4.1 Regional Platform Analysis¶
Let's begin by examining platform performance across different regions:
# Function to analyze platform performance by region
def regional_sales(data: pd.DataFrame, desired_region: str) -> pd.DataFrame:
selected = data[data['year_of_release'] >= 2011]
columns = ['platform', desired_region]
result = selected[columns].groupby('platform')[desired_region].sum().reset_index()
return result.sort_values(by=desired_region, ascending=False)
#jp_user = data[(data['year_of_release'] >= 2011) & (data['jp_sales'] >= 1.0)][['name', 'platform', 'year_of_release',
#'genre', 'jp_sales', 'critic_score',
#'user_score', 'rating']].sort_values(by='jp_sales')
#display(jp_user)
#jp_user.info()
#data.info()
#display(data)
# Analyze each region
regional_plotly = regional_sales(data, 'na_sales').merge(regional_sales(data, 'eu_sales'), on='platform', how='outer')
regional_plotly = regional_plotly.merge(regional_sales(data, 'jp_sales'), on='platform', how='outer')
regional_plotly = regional_plotly.rename(columns={'platform' : 'Platform', 'na_sales' : 'NA Sales', 'eu_sales' : 'EU Sales', 'jp_sales' : 'JP Sales'})
display(regional_plotly)
#display(regional_sales(data, 'na_sales'), regional_sales(data, 'eu_sales'), regional_sales(data, 'jp_sales'))
| Platform | NA Sales | EU Sales | JP Sales | |
|---|---|---|---|---|
| 0 | X360 | 226.80 | 117.10 | 3.35 |
| 1 | PS3 | 168.26 | 163.10 | 49.39 |
| 2 | PS4 | 108.74 | 141.09 | 15.96 |
| 3 | XOne | 93.12 | 51.59 | 0.34 |
| 4 | 3DS | 82.65 | 61.27 | 100.62 |
| 5 | Wii | 48.56 | 29.60 | 8.97 |
| 6 | WiiU | 38.10 | 25.13 | 13.01 |
| 7 | PC | 29.88 | 56.70 | 0.00 |
| 8 | DS | 18.86 | 9.24 | 8.04 |
| 9 | PSV | 12.47 | 13.07 | 21.84 |
| 10 | PSP | 2.19 | 1.77 | 24.04 |
| 11 | PS2 | 0.13 | 0.13 | 0.09 |
This is an exciting view into which regions gravitate towards certain platforms. I wasn't aware that the X360 had so many more overall sales in North America, and I'm surprised at the distribution compared to Europe. It seems the X360 and XOne just do not perform as well in EU and JP as they do in NA. The same can be said about the handhelds ('3DS', 'PSV', 'PSP') and their performance in Japan versus the other regions.
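Extracting each region's leaders is a one-liner with `nlargest`. A sketch with a hypothetical subset of the merged regional table:

```python
import pandas as pd

# Hypothetical subset of the merged regional sales table (USD million)
regional = pd.DataFrame({
    'platform': ['X360', 'PS3', '3DS', 'PSP'],
    'na_sales': [226.80, 168.26, 82.65, 2.19],
    'jp_sales': [3.35, 49.39, 100.62, 24.04],
})

# Top two platforms per region -- the leaders differ sharply by market
top_na = regional.nlargest(2, 'na_sales')['platform'].tolist()
top_jp = regional.nlargest(2, 'jp_sales')['platform'].tolist()
print(top_na, top_jp)
```

Even this tiny sample captures the pattern above: home consoles lead NA while a handheld tops JP.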
Cross-Regional Platform Comparison¶
Let's create a comparative analysis of platform performance across regions:
# Create a comparative platform analysis
# Visualize cross-regional comparison for top platforms
fig = px.histogram(regional_plotly, x='Platform', y=['NA Sales', 'EU Sales', 'JP Sales'],
barmode='group', height=700, labels={'value' : 'Total Sales (USD Million)', 'variable' : 'Region'},
title='Regional Market Analysis on Platforms')
fig.show()
#display(regional_plotly)
4.2 Regional Genre Analysis¶
Now let's examine genre preferences across regions:
# Function to analyze genre performance by region
def regional_genre(data: pd.DataFrame, desired_region: str) -> pd.DataFrame:
selected = data[data['year_of_release'] >= 2011]
columns = ['genre', desired_region]
result = selected[columns].groupby('genre')[desired_region].sum().reset_index()
return result.sort_values(by=desired_region, ascending=False)
regional_genre_ex = regional_genre(data, 'na_sales').merge(regional_genre(data, 'eu_sales'), on='genre', how='outer')
regional_genre_ex = regional_genre_ex.merge(regional_genre(data, 'jp_sales'), on='genre', how='outer')
regional_genre_ex = regional_genre_ex.rename(columns={'genre' : 'Genre', 'na_sales' : 'NA Sales', 'eu_sales' : 'EU Sales', 'jp_sales' : 'JP Sales'})
display(regional_genre_ex)
| Genre | NA Sales | EU Sales | JP Sales | |
|---|---|---|---|---|
| 0 | Action | 230.92 | 199.12 | 63.62 |
| 1 | Shooter | 194.43 | 148.02 | 11.92 |
| 2 | Sports | 108.70 | 88.48 | 11.09 |
| 3 | Role-Playing | 83.78 | 62.30 | 79.87 |
| 4 | Misc | 69.69 | 39.72 | 17.60 |
| 5 | Platform | 38.28 | 30.07 | 12.13 |
| 6 | Racing | 31.98 | 40.58 | 5.64 |
| 7 | Fighting | 31.63 | 16.24 | 12.38 |
| 8 | Simulation | 15.37 | 19.72 | 11.64 |
| 9 | Adventure | 14.76 | 13.96 | 12.07 |
| 10 | Strategy | 7.48 | 8.02 | 4.78 |
| 11 | Puzzle | 2.74 | 3.56 | 2.91 |
Cross-Regional Genre Comparison¶
Let's compare genre preferences across regions:
# Create a comparative genre analysis
fig = px.histogram(regional_genre_ex, x='Genre', y=['NA Sales', 'EU Sales', 'JP Sales'], barmode='group',
height=600, labels={'variable' : 'Region', 'value' : 'Total Sales (USD Million)'},
title='How do Genres Perform in each Market Region?')
fig.show()
4.3 ESRB Rating Impact Analysis¶
Finally, let's examine how ESRB ratings affect sales in each region:
# Function to analyze ESRB rating impact
def regional_rating(data: pd.DataFrame, desired_region: str) -> pd.DataFrame:
selected = data[data['year_of_release'] >= 2011]
columns = ['rating', desired_region]
result = selected[columns].groupby('rating')[desired_region].sum().reset_index()
return result.sort_values(by=desired_region, ascending=False)
#data.info()
#display(data)
# Analyze ESRB impact for each region
regional_rating_ex = regional_rating(data, 'na_sales').merge(regional_rating(data, 'eu_sales'), on='rating', how='outer')
regional_rating_ex = regional_rating_ex.merge(regional_rating(data, 'jp_sales'), on='rating', how='outer')
regional_rating_ex = regional_rating_ex.rename(columns={'rating' : 'Rating', 'na_sales' : 'NA Sales', 'eu_sales' : 'EU Sales', 'jp_sales' : 'JP Sales'})
display(regional_rating_ex)
fig = px.histogram(regional_rating_ex, x='Rating', y=['NA Sales', 'EU Sales', 'JP Sales'],
barmode='group', height=600, labels={'variable' : 'Region', 'value' : 'Total Sales (USD Million)'},
title='What Impact Does Rating Have on Sales?')
fig.show()
| Rating | NA Sales | EU Sales | JP Sales | |
|---|---|---|---|---|
| 0 | M | 308.45 | 250.26 | 26.14 |
| 1 | E | 170.17 | 141.73 | 36.68 |
| 2 | E10+ | 117.72 | 78.00 | 11.23 |
| 3 | T | 109.31 | 83.16 | 33.39 |
| 4 | EC | 0.42 | 0.00 | 0.00 |
| 5 | RP | 0.00 | 0.03 | 0.00 |
*people like a Mature game*
'Grand Theft Auto V' & 'Call of Duty' really rocked the gaming market in favor of M-rated Action and Shooter games. Japan consumes substantially less 'M'/'Mature' content, and even consumes more titles rated 'E'/'Everyone'. This also makes sense, since a lot of handheld titles are rated 'E'/'Everyone', and Japan leads in handheld platform consumption.
Reviewer's comment v1
Well done!
Step 5 : Hypothesis Tests¶
—Average user ratings of the Xbox One and PC platforms are the same.
—Average user ratings for the Action and Sports genres are different.
Set the alpha threshold value yourself.
Explain:
—How you formulated the null and alternative hypotheses
—What criteria you used to test the hypotheses, and why
#1. Average user ratings of the XOne == PC
#Null-Hypothesis: XOne['user_score'].mean() == PC['user_score'].mean()
#Alternate-Hypothesis: XOne['user_score'].mean() != PC['user_score'].mean()
xone_avg = data[(data['year_of_release'] >= 2011) & (data['platform'] == 'XOne')]['user_score'].dropna()
pc_avg = data[(data['year_of_release'] >= 2011) & (data['platform'] == 'PC')]['user_score'].dropna()
alpha = .05
results = st.ttest_ind(xone_avg, pc_avg, equal_var=False)
print(f'p-value: {results.pvalue}')
if results.pvalue < alpha:
    print('We reject the Null Hypothesis')
else:
    print('We cannot reject the Null Hypothesis')
p-value: 0.6130712247638477 We cannot reject the Null Hypothesis
*Null-Hypothesis: XOne['user_score'].mean() == PC['user_score'].mean()*
*Alternate-Hypothesis: XOne['user_score'].mean() != PC['user_score'].mean()*
At a statistical significance level of 5% (0.05), there is not sufficient evidence to reject the null hypothesis. We cannot conclude that the average user ratings of the XOne and PC platforms differ.
I chose to use two Series objects containing the user scores of all games from each platform, staying true to our relevant time period of 2011 - 2016. This allowed me to test the means of two statistical populations (platforms) that cannot be assumed to share the same variance (XOne vs PC), which is why I used Welch's t-test (equal_var=False).
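The unequal-variance assumption behind `equal_var=False` can itself be checked with Levene's test, whose null hypothesis is that the variances are equal. A sketch on two hypothetical, synthetically generated score samples (not the notebook's data):

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(42)
# Hypothetical user-score samples with deliberately different spreads
xone_scores = rng.normal(6.5, 1.5, 100)
pc_scores = rng.normal(6.5, 0.8, 120)

# Levene's test: a small p-value suggests unequal variances,
# which would support using Welch's t-test (equal_var=False)
stat, p_value = st.levene(xone_scores, pc_scores)
print(p_value)
```

Running this on the actual XOne and PC score series would make the `equal_var=False` choice data-driven rather than assumed.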
#2. Average user ratings of Action vs Sports
#Null-Hypothesis: Action['user_score'].mean() == Sports['user_score'].mean()
#Alternate-Hypothesis: Action['user_score'].mean() != Sports['user_score'].mean()
action_avg = data[(data['year_of_release'] >= 2011) & (data['genre'] == 'Action')]['user_score'].dropna()
sports_avg = data[(data['year_of_release'] >= 2011) & (data['genre'] == 'Sports')]['user_score'].dropna()
alpha = .05
results = st.ttest_ind(action_avg, sports_avg, equal_var=False)
print(f'p-value: {results.pvalue}')
if results.pvalue < alpha:
    print('We reject the Null Hypothesis')
else:
    print('We cannot reject the Null Hypothesis')
#display(action_avg, sports_avg)
#data.info()
#display(data)
p-value: 8.285648549432565e-19 We reject the Null Hypothesis
*Null-Hypothesis: Action['user_score'].mean() == Sports['user_score'].mean()*
*Alternate-Hypothesis: Action['user_score'].mean() != Sports['user_score'].mean()*
At a statistical significance level of 5% (0.05), there is sufficient evidence to reject the null hypothesis. We can conclude that the average user ratings of the Action and Sports genres are different.
**I chose to use two Series objects containing the user scores of all games from each genre ('Action' & 'Sports'), staying true to our relevant time period of 2011 - 2016. This allowed me to test the means of two statistical populations (genres) that cannot be assumed to share the same variance ('Action' vs 'Sports').**
Reviewer's comment v1
Well done!
Step 6. Write a general conclusion¶
*Remarkable observations:*
1. Many titles are missing some combination of [user/critic/rating], with no observable pattern.
2. 'PC' is a veteran, emerging in 1985, and staying consistent through time into the modern era (2011 - 2016)
3. 'PC' can almost be considered the main artery from which the others branch, with no generational successor in sight due to the absence of fixed hardware constraints. A constantly evolving platform, while time is a concept only to the others.
4. 'PlayStation' and 'XBox' would spear-head a surge into the gaming market, with 'Playstation' seeing 100 Million USD in sales within 4 years of launching, and 'XBox' within 8 years.
5. Our relevant time period (2011 - 2016) was selected based on the observed trajectory in the total number of titles being released, which inevitably impacts the rate at which developers make profit. Probable causes include development time demands, the scale of projects, etc.
6. Our data represents the 'passing of the torch', or a period in time where titles are being released on generationally different hardware ['PS3' -> 'PS4'], maybe as a way to give users the time to make the hardware change, a grace period if you will.
7. Handhelds and consoles follow similar trends; interestingly, the '3DS' showed an increase in sales once titles stopped releasing for the previous-generation 'DS'. A probable cause is the removal of hardware/software computational-power constraints: innovative ideas bring an increase in sales. Whether the same holds for the home consoles remains to be seen, though it seems likely.
8. 'PS3' and 'X360' struck gold with the release of 'Grand Theft Auto V' and 'Call of Duty', setting the Action & Shooter genres to be leagues ahead of others in sales.
9. Japan consumes significantly less 'M'/'Mature'-rated content than the other regions (North America, Europe). Japan even consumes more content rated 'E'/'Everyone', which correlates with the '3DS' handheld reaching USD 100 million in title sales, doubling the runner-up 'PS3'.
10. Action and Shooter titles have gapped the other genres with similar distributions between both North America and Europe, with Role-Playing as Japan's highest selling genre.
11. 'Mature' games have a significant market lead, with 'Everyone' trailing at roughly 55% of 'Mature' sales.
12. The average user ratings of [XOne/PC] and [Action/Sports] are the same.
Reviewer's comment v1
You did an excellent job analyzing the data and drawing conclusions!
It’s great that you pay attention to formulating interim conclusions in your project-this is an excellent practice that will benefit you in future projects. Keep using this approach, as it makes your analysis more structured and easier for readers and colleagues to understand.
Additionally, it’s fantastic that you included valuable insights about trends in the gaming industry. You did a great job highlighting key points and interpreting them clearly.
Overall, excellent work! Keep it up! 😊